Reporting survival analysis results with the {gtsummary} and {ggsurvfit} packages

NYR Conference 2024

Emily C. Zabor

Time-to-event endpoints are common in biomedical research

Common research questions relate to times and probabilities

  • What is the probability of remaining event-free for a certain number of years?
  • What is the average amount of event-free time?

Survival analysis methods needed when follow-up time varies

The {survival} package is the basis of the survival ecosystem in R


  • Began development in 1985
  • Total of 11.9M downloads
  • Active development ongoing
  • Many detailed vignettes covering both the basics and advanced topics
  • Includes the essential methods
install.packages("survival")
library(survival)

Example dataset

The colon data come from a clinical trial of adjuvant therapy for colon cancer, and are included in the {survival} package.


Variables of interest:

Variable Variable description Levels
rx Treatment assignment Obs(ervation), Lev(amisole), Lev(amisole)+5-FU
sex Sex 1=male, 0=female
years Years to status Continuous years
status Censoring status 0=censored, 1=recurrence


See appendix slide for code to generate altered example data colonr.

Estimate median event-free time

Use survfit() to create the survival curves from which the median recurrence-free years and 95% confidence interval are obtained:


survfit(Surv(years, status) ~ 1, data = colonr)
Call: survfit(formula = Surv(years, status) ~ 1, data = colonr)

       n events median 0.95LCL 0.95UCL
[1,] 929    468   5.52    4.01      NA

Estimate x-year event-free probability

Run summary() on the survfit() object to estimate the 5-year recurrence-free probability and 95% confidence interval:


summary(survfit(Surv(years, status) ~ 1, data = colonr), times = 5)
Call: survfit(formula = Surv(years, status) ~ 1, data = colonr)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    5    437     451    0.508  0.0166        0.476        0.541

Fit Cox regression models

Use coxph() to fit univariate and multivariable Cox regression models:


coxph(Surv(years, status) ~ rx, data = colonr)
Call:
coxph(formula = Surv(years, status) ~ rx, data = colonr)

              coef exp(coef) se(coef)      z        p
rxLev     -0.01512   0.98499  0.10708 -0.141    0.888
rxLev+5FU -0.51209   0.59924  0.11863 -4.317 1.58e-05

Likelihood ratio test=24.34  on 2 df, p=5.175e-06
n= 929, number of events= 468 

Plot Kaplan-Meier curves

Run plot() on the survfit() object to generate the Kaplan-Meier plot:

plot(survfit(Surv(years, status) ~ 1, data = colonr), 
  xlab = "Years from treatment start",
  ylab = "Recurrence-free probability")

Done! That’s everything we need!



But what if we need to put our results in a reproducible and publication-ready report?



install.packages(c("gtsummary", "ggsurvfit"))
library(gtsummary)
library(ggsurvfit)

Survival tables with {gtsummary}

Use tbl_survfit() to create:

  • Tables of median event-free time
  • Tables of x-time event-free probability

Use tbl_uvregression() to create:

  • Tables of univariate Cox regression results

Use tbl_regression() to create:

  • Tables of multivariable Cox regression results

Tables are highly customizable, see https://www.danieldsjoberg.com/gtsummary/ for details and vignettes to get started

Table of median event-free time

Use tbl_survfit() with the probs = 0.5 argument:

list(
  survfit(
    Surv(years, status) ~ 1, 
    data = colonr),
  survfit(
    Surv(years, status) ~ rx, 
    data = colonr)
  ) |> 
  tbl_survfit(
    probs = 0.5,
    label_header = 
      "Median (95% CI)"
    )
Characteristic Median (95% CI)
Overall 5.5 (4.0, —)
Treatment
    Obs 3.4 (2.2, 5.6)
    Lev 3.2 (2.2, 5.7)
    Lev+5FU — (—, —)
  • Highly customizable
  • Can set labels, titles, subtitles, footnotes, and more

Table of x-year event-free probability

Use tbl_survfit() with the times = 5 argument:

list(
  survfit(
    Surv(years, status) ~ 1, 
    data = colonr),
  survfit(
    Surv(years, status) ~ rx, 
    data = colonr)
  ) |> 
  tbl_survfit(
    times = 5,
    label_header = 
      "{time}-year RFS (95% CI)"
    )
Characteristic 5-year RFS (95% CI)
Overall 51% (48%, 54%)
Treatment
    Obs 45% (40%, 51%)
    Lev 46% (41%, 52%)
    Lev+5FU 62% (56%, 67%)
  • Note the use of {glue} syntax in the label header to dynamically code the column header

Inline results reporting

Save the tbl_survfit() result to an object (here, tab1) so that table statistics can be extracted for use inline with the inline_text() function.


The inline code:

The median (95% CI) recurrence-free time for the Levamisole treatment group is `r inline_text(tab1, variable = rx, level = 'Lev', prob = 0.5)`.


Will show as:

The median (95% CI) recurrence-free time for the Levamisole treatment group is 3.2 (2.2, 5.7).

Univariate Cox regression table

Use tbl_uvregression():

colonr |> 
  tbl_uvregression(
    method = coxph,
    y = Surv(years, status),
    include = c(rx, sex),
    exponentiate = T
    ) |> 
  add_global_p()
Characteristic N HR1 95% CI1 p-value
Treatment 929 <0.001
    Obs
    Lev 0.98 0.80, 1.21
    Lev+5FU 0.60 0.47, 0.76
Sex 929 0.4
    Female
    Male 0.92 0.77, 1.10
1 HR = Hazard Ratio, CI = Confidence Interval
  • Fits models for all variables listed in include =
  • exponentiate = T to obtain hazard ratios
  • Add global p-values

Multivariable Cox regression table

Pass fitted model results directly to tbl_regression():

coxph(
  Surv(years, status) ~ 
    rx + sex, 
  data = colonr
  ) |> 
  tbl_regression(
    exponentiate = T
  ) |> 
  add_global_p()
Characteristic HR1 95% CI1 p-value
Treatment <0.001
    Obs
    Lev 0.99 0.80, 1.22
    Lev+5FU 0.60 0.47, 0.75
Sex 0.3
    Female
    Male 0.90 0.75, 1.08
1 HR = Hazard Ratio, CI = Confidence Interval

Time-to-event plots with {ggsurvfit}

Uses {ggplot2} as the basis so known customizations are available with the + operator.

  • Plot Kaplan-Meier curves using ggsurvfit
  • Plot cumulative incidence curves for competing risks using ggcuminc
  • Plot multi-state models using ggsurvfit

Options to:

  • Add confidence intervals
  • Add risk tables
  • Add quantiles
  • And more

See https://www.danieldsjoberg.com/ggsurvfit/ for details and vignettes to get started

Kaplan-Meier curves with ggsurvfit()

survfit2(
  Surv(years, status) ~ 1, 
  data = colonr
  ) |>
  ggsurvfit() + 
  add_confidence_interval() + 
  add_risktable() +
  scale_y_continuous(
    limits = c(0, 1)
    ) + 
  scale_x_continuous(
    limits = c(0, 8), 
    breaks = seq(0, 8, 2)
  ) +
  labs(
    x = "Years from treatment start",
    y = "Recurrence-free probability"
  )

Put it all together

Thank you, and please get in touch


zabore2@ccf.org

https://www.emilyzabor.com/

https://github.com/zabore

https://www.linkedin.com/in/emily-zabor-59b902b7/



Link to slides: https://github.com/zabore/slidedecks/blob/master/2024-NYR-Zabor.html

Appendix: Create example dataset

  • colon dataset from {survival} altered to:

    • Limit to one row per patient with interest in recurrence rather than death
    • Scale time from days to years
    • Recode sex levels to informative values
    • Label variables of interest
colonr <- 
  colon |> 
  filter(etype == 1) |> 
  mutate(
    years = time / 365.25,
    sex = case_match(
      sex,
      0 ~ "Female",
      1 ~ "Male"
    )
  ) |> 
  labelled::set_variable_labels(
    rx = "Treatment", 
    sex = "Sex"
  )
include-after: |